D1270-D1277 Nucleic Acids Research, 2012, Vol. 40, Database issue 
doi:10.1093/nar/gkr880 



Published online 7 November 2011 



EcoliWiki: a wiki-based community resource 
for Escherichia coli 

Brenley K. McIntosh 1 , Daniel P. Renfro 1 , Gwendowlyn S. Knapp 1 , 

Chanchala R. Lairikyengbam 2 , Nathan M. Liles 1 , Lili Niu 1 , Amanda M. Supak 1 , 

Anand Venkatraman 1 , Adrienne E. Zweifel 1 , Deborah A. Siegele 3 and James C. Hu 1,* 

1 Department of Biochemistry and Biophysics, Texas Agrilife Research, 2 Department of Computer Science and 
Engineering and 3 Department of Biology, Texas A&M University College Station, TX 77843, USA 

Received August 14, 2011; Revised September 28, 2011; Accepted September 29, 2011 



ABSTRACT 

EcoliWiki is the community annotation component of 
the PortEco (http://porteco.org; formerly EcoliHub) 
project, an online data resource that integrates in- 
formation on laboratory strains of Escherichia coli, 
its phages, plasmids and mobile genetic elements. 
As one of the early adopters of the wiki approach to 
model organism databases, EcoliWiki was designed 
to not only facilitate community-driven sharing of 
biological knowledge about E. coli as a model organ- 
ism, but also to be interoperable with other data re- 
sources. EcoliWiki content currently covers genes 
from five laboratory E. coli strains, 21 bacteriophage 
genomes, F plasmid and eight transposons. 
EcoliWiki integrates the MediaWiki wiki platform with 
other open-source software tools and in-house soft- 
ware development to extend how wikis can be used 
for model organism databases. EcoliWiki can be 
accessed online at http://ecoliwiki.net. 

INTRODUCTION 

Laboratory Escherichia coli strains form the basis of much 
of our fundamental understanding of the molecular and 
genetic basis of life. As a central model system, E. coli has 
been either the primary focus or a key component for 
many bioinformatics data resources that cover overlapp- 
ing but distinct aspects of E. coli biology. Many existing re- 
sources provide encyclopedic information about genes, gene 



products, transcripts and regulons of E. coli K-12 (1-6). 
Nevertheless, these resources only cover a fraction of the 
E. coli knowledge base wanted by biologists working with 
E. coli, which includes not only those interested in the 
biology of E. coli per se, but also many more using 
E. coli as a platform for a wide range of basic research 
and biotechnology. 

We designed EcoliWiki (http://ecoliwiki.net) to facili- 
tate community-driven sharing of biological knowledge 
of the model organism E. coli. First implemented in 
2007, EcoliWiki is one of the early adopters of the wiki 
approach to model organism databases. Although it has 
been mentioned and cited in several reports and descrip- 
tions of other wiki-based projects (7-10), this report is the 
first comprehensive description of EcoliWiki. 

One of our primary objectives for EcoliWiki is to 
capture community-contributed information about la- 
boratory strains, phages, plasmids and so forth. Data 
about E. coli are being continually generated through bio- 
informatics, proteomic, genomic, biochemical and genetic 
research. This massive onslaught of data can be difficult 
to manage and integrate, though its importance to other 
scientists cannot be overstated. It is crucial that the 
information being produced is interwoven with existing 
knowledge in an easily accessible and searchable 
resource. Linking all of this information to the relevant 
gene as well as to publications is vital for identifying 
or predicting the function of the gene product not only 
in E. coli, but also for orthologous gene products. 
EcoliWiki is designed with this linking and interoperabil- 
ity in mind. 

ECOLIWIKI CONTENT 

EcoliWiki houses information in >64000 content pages. 
As in other MediaWiki-based wikis, EcoliWiki content 
pages are paired with talk pages where users can discuss 
or debate content. EcoliWiki also extensively uses 
MediaWiki's system of placing pages in Categories, and 
the ability to nest Categories and Subcategories. Category 
terms increase the usability of the wiki for both users and 
automated processes. 

Gene-centric pages 

EcoliWiki contains groups of pages for genes from the 
E. coli lab strains MG1655 (11,12), W3110 (11), DH10B 
(13), BW2952 (14) and REL606 (15). Where a canonical 
gene name is available from EcoGene (5), it is used as the 
basis for the page names. Identifiers and non-canonical 
gene names are treated as synonyms that redirect to the 
gene, or to a disambiguation page when the same synonym 
is used for more than one gene. Orthologs from different 
strains are grouped together so that their annotations are 
made in a shared location. We used BLAST (16) to identify 
orthologs based on homology within the gene and in its 
flanking sequences. Note, however, that this method relies 
on orthologs being annotated as features in the GenBank 
or RefSeq files we use as sources. Orthologs that are 
present in a genome, but not annotated, will be missed. 
Improved projection of annotations between genomes is 
needed and is planned for the future. Some genes from the 
E. coli B strain REL606 share gene names with non- 
homologous genes in MG1655 (15). For example, two 
blocks of genes involved in LPS synthesis in MG1655 
and REL606 are in similar locations. Several of these 
have common waa and wbb names although they do not 
have significant sequence similarity. To distinguish these 
from the K-12 genes, we give these genes from E. coli B 
their own sets of gene pages with page titles prefixed with 
E_coli_B. Other genes from REL606 that are not present 
in the K-12 strains do not have canonical gene names 
based on the Demerec system (17). In these cases, we 
base the gene names on the ECB locus tags (15). 
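
The redirect rule described above (a synonym points to its gene page, or to a disambiguation page when it is shared by more than one gene) can be illustrated with a minimal Python sketch. This is not EcoliWiki's code: the function name, the gene names and identifiers, and the "(disambiguation)" title suffix are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_redirects(genes):
    """Map each synonym to a gene page, or to a disambiguation page if shared.

    `genes` maps a canonical gene name to a list of synonyms and identifiers.
    """
    targets = defaultdict(set)
    for canonical, synonyms in genes.items():
        for synonym in synonyms:
            targets[synonym].add(canonical)

    redirects = {}
    for synonym, canonicals in targets.items():
        if synonym in genes:
            # A canonical name keeps its own page and is never redirected.
            continue
        if len(canonicals) == 1:
            redirects[synonym] = next(iter(canonicals))
        else:
            # Hypothetical title scheme for disambiguation pages.
            redirects[synonym] = f"{synonym} (disambiguation)"
    return redirects


# Hypothetical input; these names are not taken from EcoliWiki.
genes = {
    "geneA": ["b0001", "oldX"],
    "geneB": ["b0002", "oldX"],
}
redirects = build_redirects(genes)
```

Here `b0001` and `b0002` each resolve to a single gene page, while the shared synonym `oldX` resolves to a disambiguation page.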

In addition to E. coli genomes, we have gene-centric 
pages for the F plasmid, 21 different bacteriophage gen- 
omes and eight transposons (Table 1). These genes were 
imported from GenBank and RefSeq records, where avail- 
able. The genome of bacteriophage phi80 and some of the 
transposon sequences were extracted from other sequence 
records. These pages are named with prefixes indicating 
the source genome, e.g. Phage_lambda_cI:Quickview. 

Every gene in EcoliWiki is associated with six wiki 
pages linked by formatting that mimics tabs: Quickview, 
Gene, Gene Product(s), Expression, Evolution and On One 
Page. Each of the first five contains pertinent information 
about that aspect of the gene, and they all contain editable 
tables and notes. Each of these pages has a references 
section and a list of categories the page belongs to. 
EcoliWiki pages include links to other databases including 
EcoCyc (3), EcoGene (5), EchoBase (4), ASAP (2), 
EcoliGenExpDB (http://genexpdb.ou.edu/main/), RefSeq 
(18), UniProt (19), Pfam (20), Brenda (21), SwissModel 
(22), and ModBase (23). References are automatically 
generated using a modified version of the Cite extension 
(see below). The On One Page view catenates content from 
the other five for users who prefer scrolling to clicking 
between tabs. Figure 1 shows some of the elements of 
one of the gene-centric pages. 

Table 1. Sources of genes listed on EcoliWiki [table body not recovered; surviving entry: Transposon Tn3] 

The Quickview page provides users with a brief summary 
of data in the form of a table that dynamically updates 
from tables on the other associated pages for the gene. The 
Quickview table also contains links to the DNA and pro- 
tein sequences for the gene, literature searches and other 
database searches. Below the table, a Notes area is pro- 
vided for general information. The Gene, Gene Product(s), 
Expression and Evolution pages consist of multiple sections 
containing tables and Notes fields to allow capture of 
much more detailed information. 

Information on the Gene page includes gene synonyms, 
mutant alleles and their availability, genetic interactions, 
and a set of images from GBrowse (25) showing the gen- 
omic context of the gene in each strain where it is found. 

Figure 1. Example Gene-centric page for PhoQ product(s). (A) Overall page, showing the structure of EcoliWiki pages with tables (red arrows), 
figures, jmol viewer (blue arrow) and text notes (yellow arrows). (B-E) Show expanded views of page areas. (B) Simulated tabs for navigation 
between pages related to PhoQ, and links to page sections. (C) User-editable table for GO annotations for PhoQ. This table is periodically 
repopulated with annotations from EcoCyc. (D) Domains and motifs table and associated figures. The motifs diagram is generated from the 
content of the domains table to its left, and from the alleles and phenotypes table on the PhoQ:Gene page. The TMHMM (24) diagram is 
automatically generated for all genes. (E) References and categories. References are automatically generated from a software extension that recog- 
nizes PMIDs in the tables and embedded via markup in the text notes sections. 



The Gene Product(s) page includes information about the 
protein or RNA. A major focus of EcoliWiki is the table 
for functional annotation using the Gene Ontology (GO) 
(26). This table allows users to add and correct GO 
annotations that we share with EcoCyc (8) and regularly 
deposit with the GO consortium (Figure 2). The Gene 
Product(s) page also includes sections for physical 
interactions, domains and motifs, structures, physical 
properties and links to information at other sites. Each of 
these is seeded and updated with information that users 
can elaborate via community annotation. A section is 
provided to allow users to place a Jmol structure viewer 
(http://www.jmol.org/) on the product page. EcoliWiki 
includes instructions for how to use Jmol to make 
Proteopedia-style (27) links that launch scripts to customize 
the structure image. 

Figure 2. Progress toward annotation of E. coli genes. EcoliWiki and EcoCyc are collaborating to improve how the experimental literature is 
captured in GO annotations (8). Complete annotation of a gene is defined as at least one literature-based annotation to each of the three ontologies: 
molecular function, cellular component and biological process; unreviewed computational annotations (those with the IEA evidence code) are 
omitted. The gene association file with current annotations is available at http://www.geneontology.org/GO.downloads.annotations.shtml. 
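
The definition of complete annotation given in the Figure 2 caption can be expressed as a short check. The Python sketch below is an illustration only, not the procedure used to produce Figure 2. It assumes annotations are available as (aspect, evidence code) pairs using the single-letter aspect codes of the GO gene association file format (F, P, C), and it treats every non-IEA annotation as literature-based, which is a simplifying assumption.

```python
# Aspect codes from the GO gene association file format:
# F = molecular function, P = biological process, C = cellular component.
ASPECTS = {"F", "P", "C"}


def is_completely_annotated(annotations):
    """Return True if every ontology has at least one non-IEA annotation.

    `annotations` is an iterable of (aspect, evidence_code) pairs for one gene.
    Treating all non-IEA codes as literature-based is an assumption.
    """
    covered = {aspect for aspect, evidence in annotations if evidence != "IEA"}
    return ASPECTS <= covered


# Hypothetical annotation sets for two genes.
partial = [("F", "IDA"), ("P", "IMP"), ("C", "IEA")]
complete = [("F", "IDA"), ("P", "IMP"), ("C", "IDA")]
```

Under these assumptions `partial` does not count as complete, because its only cellular component annotation carries the IEA evidence code.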

The Expression page contains information about gene 
expression and regulation. Transcription units are shown 
with an image of the operon provided by RegulonDB (1). 
The Expression page includes a table for quantitative data 
about the level, synthesis and degradation of the gene 
product and/or its mRNA. The Evolution page has infor- 
mation about homologs and includes links to sequence 
comparison and ortholog databases such as BLAST 
(16), InParanoid (28), CDD (29) and YOGY (30). 

Literature 

A common concern about wiki-based content is its cred- 
ibility. An important way of improving the credibility 
of community-provided content is to provide a robust 
system for linking assertions to the peer-reviewed primary 
literature. EcoliWiki uses a modified version of the Cite 
extension used by Wikipedia. Our modified version recognizes 
'ref' tags containing PubMed identifiers in the format 
'<ref name='PMID:#'/>'. EcoliWiki uses these tags 
to generate in-place numbered citation links and a refer- 
ence list with bibliographic information provided by 
NCBI's E-Utilities (31). The reference also generates a 
link to create a page about the cited article in EcoliWiki. 
This reference page is named for the PMID and is seeded 
with information from PubMed including the abstract, 



links to the full-text article (if available) and a table for 
GO annotations relevant to that publication. Reference 
pages allow for a community-driven equivalent of the 
curated papers functionality of other model organism 
databases. 
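
How PMID-bearing ref tags can be turned into numbered in-place citations is sketched below in Python. This is not the modified Cite extension itself, which is a MediaWiki extension. The sketch assumes the tag takes the form `<ref name='PMID:12345'/>`, uses made-up PMIDs, and omits the bibliographic lookup through NCBI's E-Utilities.

```python
import re

# Assumed tag form: <ref name='PMID:12345'/> (single or double quotes).
REF_TAG = re.compile(r"""<ref\s+name\s*=\s*['"]PMID:(\d+)['"]\s*/?>""")


def number_citations(wikitext):
    """Replace PMID ref tags with [n] markers.

    Returns the rewritten text and the PMIDs in order of first appearance,
    which is the order the reference list would use.
    """
    order = []

    def replace(match):
        pmid = match.group(1)
        if pmid not in order:
            order.append(pmid)
        # A repeated PMID reuses the number it was first given.
        return f"[{order.index(pmid) + 1}]"

    return REF_TAG.sub(replace, wikitext), order


# Placeholder text with made-up PMIDs.
text = ("Claim one<ref name='PMID:111'/> and claim two<ref name='PMID:222'/>, "
        "restating claim one<ref name='PMID:111'/>.")
numbered, pmids = number_citations(text)
```

Each PMID in `pmids` would then be resolved to bibliographic details and could also serve as the title of the corresponding reference page.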

Other page types 

Table 2 lists major page types in EcoliWiki. EcoliWiki 
aims to provide information about any topic of interest 
to scientists working with laboratory E. coli, its phage, 
plasmids and mobile elements, either from interest in 
E. coli's fundamental biology or simply as a biotechnology 
tool. Thus, we have pages for cloning vectors, strains, 
methods and other online resources. EcoliWiki generates 
pages for any GO term used in an annotation; these can be 
used for information about how a particular process or 
complex is used in the organisms covered by EcoliWiki. 

One of the most important aspects of wikis is how they 
allow users to create new pages. EcoliWiki has a system of 
form-based entry that will create several page types based 
on internal templates. Users can also create pages that do 
not necessarily fit into any predefined type. 



INTEGRATION WITH INTERNAL AND EXTERNAL 
TOOLS 

One of the early design decisions for EcoliWiki was 
whether to build an independent website or to incorporate 
the desired E. coli content into Wikipedia, as has 
been done by the GeneWiki project (32). Although other 
factors contributed to our decision to build a stand-alone 
website, a major benefit is in how it allows us to customize 
the wiki to handle complex data in structured tables. The 
division of pages into sections of tables and notes makes 
EcoliWiki more complex than some biological wikis, and 
it has been criticized for that complexity (7). However, this 
structure provides us with the ability to retrieve data from 
the wiki without the need for natural language processing 
to extract specific types of information from free text. This 
allows us to integrate EcoliWiki content with internal and 
external tools in ways that would otherwise be much more 
difficult (Supplementary Data: Technical and Table S1). 

Table 2. Major types of pages in EcoliWiki 
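
To illustrate why structured tables make data retrieval simpler than mining free text, the following Python sketch reads a table written in basic MediaWiki table markup into a list of records. It is a simplified illustration, not EcoliWiki's implementation: it handles only single-line header and cell rows, ignores cell attributes and templates, and the example table content is hypothetical.

```python
def parse_wikitable(wikitext):
    """Parse a simple MediaWiki table into a list of {header: cell} dicts."""
    headers, rows, current = [], [], None
    for raw in wikitext.splitlines():
        line = raw.strip()
        if line.startswith("{|") or line.startswith("|+"):
            continue  # table opening or caption line
        if line.startswith("|}"):
            break  # table end
        if line.startswith("|-"):
            # Row separator: store the row collected so far, if any.
            if current:
                rows.append(current)
            current = []
        elif line.startswith("!"):
            headers.extend(cell.strip() for cell in line[1:].split("!!"))
        elif line.startswith("|"):
            if current is None:
                current = []
            current.extend(cell.strip() for cell in line[1:].split("||"))
    if current:
        rows.append(current)
    return [dict(zip(headers, row)) for row in rows]


# Hypothetical table content.
sample = """{| class="wikitable"
! Field !! Value
|-
| Standard name || geneA
|-
| Synonyms || b0001
|}"""
records = parse_wikitable(sample)
```

Once a table is available as records like these, a field such as the standard name can be looked up directly by key.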

Within the wiki itself, data from tables on other pages is 
used in the Quickview table. The Quickview table not only 
looks up nomenclature from the Gene and Gene 
Product(s) pages, but also uses coordinates and sequences 
to provide data to linked tools for displaying the sequence 
or viewing alternative reading frames using EMBOSS 
(33), designing PCR primers with GBrowse (25) and 
Primer3 (34), and engineering silent restriction sites (35). 
On the Gene Product(s) page, a graphical display of the 
locations of domains and mutations is generated from in- 
formation in two different tables on the Gene and Gene 
Product(s) pages. 

As noted above, EcoliWiki Gene pages incorporate 
thumbnail images of local genome context using our in- 
stallation of GBrowse (25). Our instance of GBrowse 
provides genome browsing for all the E. coli and phage 
genomes in EcoliWiki, and for several plasmids and 
mobile elements. We have added several custom tracks 
to the E. coli MG1655 GBrowse, including transcription 
units from RegulonDB (1), rRNA operons, cryptic 
prophage, repetitive elements and others. We have been 
adding tracks for data from ChIP-chip studies available 
from public repositories or from authors (36-41). These 
tracks link to EcoliWiki reference pages for more detailed 
descriptions of the experiments and data processing. 

EcoliWiki has implemented a modified version of 
Textpresso (42), a full-text literature search engine. Our 
Textpresso periodically updates its corpus from reference 
information added to EcoliWiki. In 2011, EcoliWiki 



deployed a tool that allows users to search fitness data 
and correlations from a large-scale phenotyping study pub- 
lished by Nichols et al. (43). This tool uses EcoliWiki to 
match gene names and synonyms to genes used in the 
study. 

Structured data allows EcoliWiki to provide web 
services for the integrated PortEco search of different 
E. coli resources. EcoliWiki web services can also be 
used to identify pages that have been edited in a specified 
date range. 
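
The interface of the EcoliWiki web services is not described here, so the following Python sketch uses a different, generic mechanism: the `recentchanges` list of the standard MediaWiki Action API, which can report pages edited within a date range. The endpoint URL is a placeholder, and whether EcoliWiki exposes this API at any particular address is an assumption.

```python
from urllib.parse import urlencode


def edited_pages_url(api_endpoint, start, end, limit=50):
    """Build a MediaWiki Action API URL listing changes between two timestamps.

    `start` and `end` are ISO 8601 timestamps; with rcdir=newer the listing
    runs from the older timestamp (rcstart) to the newer one (rcend).
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcstart": start,
        "rcend": end,
        "rcdir": "newer",
        "rcprop": "title|timestamp",
        "rclimit": limit,
        "format": "json",
    }
    return f"{api_endpoint}?{urlencode(params)}"


# Placeholder endpoint, not a real EcoliWiki address.
url = edited_pages_url(
    "https://wiki.example.org/api.php",
    "2011-08-01T00:00:00Z",
    "2011-08-10T00:00:00Z",
)
```

The sketch only constructs the request URL. Fetching it and reading the page titles from the JSON response are left out.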



DISCUSSION 

EcoliWiki and other wiki-based resources show that wiki 
software can be adapted for many of the purposes of an 
online model organism database. EcoliWiki adds other 
freely available tools such as GBrowse and Jmol, and 
our own open-source development to extend the 
capabilities of the basic MediaWiki platform. 

Both web analytics (Figure 3) and anecdotal feedback 
from users suggest that EcoliWiki is successful in the sense 
that it is widely used by our target community, and is 
viewed as a useful resource. As a community annotation 
system, however, our goals also include encouraging users 
to contribute to the content. Any user can view content on 
the EcoliWiki website; however, users must register in 
order to create or edit content on the site. Although the 
requirement for an account presents an unfortunate disin- 
centive for community participation, requiring account 
creation is a relatively quick and simple method to avoid 
spam and vandalism. We use a 'vampire model' for user 
registration, where any registered user can create new 
users. 

As of 10 August 2011, non-staff users had contributed 1513 
edits to 485 pages since EcoliWiki's rollout in 2007. 
While this is encouraging, the majority of the manually 
curated content in EcoliWiki is still generated by members 
of the EcoliWiki project. Only a small fraction of the user 
base has contributed content. EcoliWiki is viewed by sev- 
eral thousand users who are identified as repeat visitors 
(e.g. 6668 visited at least 5 times between 11 August 2010 
and 10 August 2011), but there are only 710 non-staff 
registered users. Of these, only 170 have edited 
EcoliWiki, and many of these have only edited it once 
(Figure 3). 

Increased user participation is needed to realize one of 
the basic ideas of the wiki model: that quality is improved 
by multiple users reviewing and refining the same content. 
To further encourage community participation, we display 
contributor usernames for each page on the sidebar in 
addition to the standard display of editors in the page 
history. We also promote editing the wiki in workshops 
and through email contact with authors of recent papers. 
Nevertheless, in the absence of other incentives, the low 
editing participation is consistent with what is seen in 
other resources built on voluntary collaboration, includ- 
ing Wikipedia (http://stats.wikimedia.org/EN/Sitemap. 
htm). Wikipedia overcomes this by having such a large 
user base that even a small fraction of users is sufficient 
to create and improve millions of pages of content. 

Figure 3. (A) Visitor statistics from Google Analytics for 11 August 2010-10 August 2011. The numbers of visits that are the n-th visit for a given 
user indicate users who return to EcoliWiki multiple times. Note that the number of users who visited n times is included in the count for users who 
visited fewer than n times. Thus, each bar is for the number of users who visited n times or more. (B) Distribution of user contributions to EcoliWiki, 
excluding project employees. 
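
The cumulative reading of the bars in Figure 3A (each bar counts users who visited n times or more) can be made concrete with a small Python sketch. The counts below are hypothetical and are not the Google Analytics figures; the function name is likewise illustrative.

```python
def at_least_n(exact_counts):
    """Convert 'visited exactly n times' counts into 'visited n or more times'.

    `exact_counts` maps n to the number of users with exactly n visits.
    """
    cumulative, running = {}, 0
    # Accumulate from the largest n downward so each total includes all larger n.
    for n in sorted(exact_counts, reverse=True):
        running += exact_counts[n]
        cumulative[n] = running
    return dict(sorted(cumulative.items()))


# Hypothetical counts: 50 users with one visit, 20 with two, 5 with three.
bars = at_least_n({1: 50, 2: 20, 3: 5})
```

With these numbers the bar for n = 1 is 75, because it includes every user counted in the bars for n = 2 and n = 3.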



Increasing the participation in editing will require new 
ways to make editing align with the other incentives that 
determine how scientists allocate their scarce resources of 
time and energy. One potential mechanism is to encourage 
academic scientists to incorporate editing EcoliWiki into 
teaching. A section of the wiki on Educational Resources 
can be used to share materials and methods for this pur- 
pose. We have also developed wiki extensions to allow 
instructors to more easily track where their students 
have edited EcoliWiki. 

Future directions for EcoliWiki will also focus on 
increasing its utility through further integration with 



other tools via the PortEco project (http://porteco.org). 
In particular, we are working to improve the connections 
with our PortEco partners, EcoCyc (3), the PortEco 
instance of the Stanford Microarray Database (44) and 
PANTHER (45) in ways that optimize their synergy with 
wikis. For example, while it would not be appropriate to 
have community editing of data from a published tran- 
scriptome experiment, community curation of the meta- 
data associated with that experiment can be valuable for 
analyses across experiments from different labs. Similarly, 
while it might be difficult to devise a system for direct 
community editing of phylogenetic trees, community 
commentary could enrich the evaluation of hypotheses 
generated from phylogenetic inference. 

DATABASE AVAILABILITY 

EcoliWiki is freely available via the EcoliWiki website 
(http://ecoliwiki.net). 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Methods. 